Retail Price Optimization¶
Retail price optimization involves determining the optimal selling price for products or services to maximize revenue and profit. Optimizing retail prices means finding the perfect balance between the price you charge for your products and the number of products you can sell at that price. The ultimate aim is to charge a price that helps you make the most money and attracts enough customers to buy your products. It involves using data and pricing strategies to find the right price that maximizes your sales and profits while keeping customers happy.
Let’s start the task of Retail Price Optimization by importing the necessary Python libraries and the dataset:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
data = pd.read_csv('retail_price.csv')
print(data.head())
product_id product_category_name month_year qty total_price \
0 bed1 bed_bath_table 01-05-2017 1 45.95
1 bed1 bed_bath_table 01-06-2017 3 137.85
2 bed1 bed_bath_table 01-07-2017 6 275.70
3 bed1 bed_bath_table 01-08-2017 4 183.80
4 bed1 bed_bath_table 01-09-2017 2 91.90
freight_price unit_price product_name_lenght product_description_lenght \
0 15.100000 45.95 39 161
1 12.933333 45.95 39 161
2 14.840000 45.95 39 161
3 14.287500 45.95 39 161
4 15.100000 45.95 39 161
product_photos_qty ... comp_1 ps1 fp1 comp_2 ps2 \
0 2 ... 89.9 3.9 15.011897 215.000000 4.4
1 2 ... 89.9 3.9 14.769216 209.000000 4.4
2 2 ... 89.9 3.9 13.993833 205.000000 4.4
3 2 ... 89.9 3.9 14.656757 199.509804 4.4
4 2 ... 89.9 3.9 18.776522 163.398710 4.4
fp2 comp_3 ps3 fp3 lag_price
0 8.760000 45.95 4.0 15.100000 45.90
1 21.322000 45.95 4.0 12.933333 45.95
2 22.195932 45.95 4.0 14.840000 45.95
3 19.412885 45.95 4.0 14.287500 45.95
4 24.324687 45.95 4.0 15.100000 45.95
[5 rows x 30 columns]
Before moving forward, let’s have a look if the data has null values or not:
print(data.isnull().sum())
product_id 0 product_category_name 0 month_year 0 qty 0 total_price 0 freight_price 0 unit_price 0 product_name_lenght 0 product_description_lenght 0 product_photos_qty 0 product_weight_g 0 product_score 0 customers 0 weekday 0 weekend 0 holiday 0 month 0 year 0 s 0 volume 0 comp_1 0 ps1 0 fp1 0 comp_2 0 ps2 0 fp2 0 comp_3 0 ps3 0 fp3 0 lag_price 0 dtype: int64
Now let’s have a look at the descriptive statistics of the data:
print(data.describe())
qty total_price freight_price unit_price \
count 676.000000 676.000000 676.000000 676.000000
mean 14.495562 1422.708728 20.682270 106.496800
std 15.443421 1700.123100 10.081817 76.182972
min 1.000000 19.900000 0.000000 19.900000
25% 4.000000 333.700000 14.761912 53.900000
50% 10.000000 807.890000 17.518472 89.900000
75% 18.000000 1887.322500 22.713558 129.990000
max 122.000000 12095.000000 79.760000 364.000000
product_name_lenght product_description_lenght product_photos_qty \
count 676.000000 676.000000 676.000000
mean 48.720414 767.399408 1.994083
std 9.420715 655.205015 1.420473
min 29.000000 100.000000 1.000000
25% 40.000000 339.000000 1.000000
50% 51.000000 501.000000 1.500000
75% 57.000000 903.000000 2.000000
max 60.000000 3006.000000 8.000000
product_weight_g product_score customers ... comp_1 \
count 676.000000 676.000000 676.000000 ... 676.000000
mean 1847.498521 4.085503 81.028107 ... 79.452054
std 2274.808483 0.232021 62.055560 ... 47.933358
min 100.000000 3.300000 1.000000 ... 19.900000
25% 348.000000 3.900000 34.000000 ... 49.910000
50% 950.000000 4.100000 62.000000 ... 69.900000
75% 1850.000000 4.200000 116.000000 ... 104.256549
max 9750.000000 4.500000 339.000000 ... 349.900000
ps1 fp1 comp_2 ps2 fp2 comp_3 \
count 676.000000 676.000000 676.000000 676.000000 676.000000 676.000000
mean 4.159467 18.597610 92.930079 4.123521 18.620644 84.182642
std 0.121652 9.406537 49.481269 0.207189 6.424174 47.745789
min 3.700000 0.095439 19.900000 3.300000 4.410000 19.900000
25% 4.100000 13.826429 53.900000 4.100000 14.485000 53.785714
50% 4.200000 16.618984 89.990000 4.200000 16.811765 59.900000
75% 4.200000 19.732500 117.888889 4.200000 21.665238 99.990000
max 4.500000 57.230000 349.900000 4.400000 57.230000 255.610000
ps3 fp3 lag_price
count 676.000000 676.000000 676.000000
mean 4.002071 17.965007 107.399684
std 0.233292 5.533256 76.974657
min 3.500000 7.670000 19.850000
25% 3.900000 15.042727 55.668750
50% 4.000000 16.517110 89.900000
75% 4.100000 19.447778 129.990000
max 4.400000 57.230000 364.000000
[8 rows x 27 columns]
Now let’s have a look at the distribution of the prices of the products:
plt.figure(figsize=(8, 5))
plt.hist(data['total_price'], bins=20, color='cornflowerblue', edgecolor='black')
# Add labels and title
plt.title('Distribution of Total Price', fontsize=14)
plt.xlabel('Total Price')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
Now let’s have a look at the distribution of the unit prices using a box plot:
plt.figure(figsize=(6, 5))
plt.boxplot(data['unit_price'], patch_artist=True,
boxprops=dict(facecolor='lightblue', color='black'),
medianprops=dict(color='red'),
whiskerprops=dict(color='black'),
capprops=dict(color='black'))
# Add title and labels
plt.title('Box Plot of Unit Price', fontsize=14)
plt.ylabel('Unit Price')
plt.tight_layout()
plt.show()
Now let’s have a look at the relationship between quantity and total prices:
plt.figure(figsize=(8, 6))
# Scatter plot with regression line
sns.regplot(data=data, x='qty', y='total_price', scatter_kws={'color': 'blue'}, line_kws={'color': 'red'})
# Labels and title
plt.title('Quantity vs Total Price', fontsize=14)
plt.xlabel('Quantity')
plt.ylabel('Total Price')
plt.tight_layout()
plt.show()
Thus, the relationship between quantity and total prices is linear. It indicates that the price structure is based on a fixed unit price, where the total price is calculated by multiplying the quantity by the unit price.
Now let’s have a look at the average total prices by product categories:
# Calculate average total price by category
avg_price_by_category = data.groupby('product_category_name')['total_price'].mean().reset_index()
plt.figure(figsize=(12, 6))
sns.barplot(data=avg_price_by_category, x='product_category_name', y='total_price', palette='pastel')
# Rotate x-axis labels for readability
plt.xticks(rotation=45, ha='right')
# Labels and title
plt.title('Average Total Price by Product Category', fontsize=14)
plt.xlabel('Product Category')
plt.ylabel('Average Total Price')
plt.tight_layout()
plt.show()
C:\Users\chaitna\AppData\Local\Temp\ipykernel_3632\35384386.py:5: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.barplot(data=avg_price_by_category, x='product_category_name', y='total_price', palette='pastel')
Now let’s have a look at the distribution of total prices by weekday using a box plot:
plt.figure(figsize=(10, 6))
sns.boxplot(data=data, x='weekday', y='total_price', palette='pastel')
# Labels and title
plt.title('Box Plot of Total Price by Weekday', fontsize=14)
plt.xlabel('Weekday')
plt.ylabel('Total Price')
plt.tight_layout()
plt.show()
C:\Users\chaitna\AppData\Local\Temp\ipykernel_3632\3637933597.py:2: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.boxplot(data=data, x='weekday', y='total_price', palette='pastel')
Now let’s have a look at the distribution of total prices by holiday using a box plot:
plt.figure(figsize=(8, 6))
sns.boxplot(data=data, x='holiday', y='total_price', palette='Set2')
# Add title and axis labels
plt.title('Box Plot of Total Price by Holiday', fontsize=14)
plt.xlabel('Holiday')
plt.ylabel('Total Price')
plt.tight_layout()
plt.show()
C:\Users\chaitna\AppData\Local\Temp\ipykernel_3632\835259370.py:2: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.boxplot(data=data, x='holiday', y='total_price', palette='Set2')
Now let’s have a look at the correlation between the numerical features with each other:
# Only keep numeric columns
numeric_data = data.select_dtypes(include='number')
# Compute correlation matrix
correlation_matrix = numeric_data.corr()
# Set up the figure size
plt.figure(figsize=(12, 10))
# Draw heatmap without annotations for a clean look
sns.heatmap(correlation_matrix, cmap='plasma', center=0, square=True,
cbar_kws={"shrink": 0.8}, linewidths=0.5)
# Titles and axis settings
plt.title('Correlation Heatmap of Numerical Features', fontsize=16)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()
Analyzing competitors’ pricing strategies is essential in optimizing retail prices. Monitoring and benchmarking against competitors’ prices can help identify opportunities to price competitively, either by pricing below or above the competition, depending on the retailer’s positioning and strategy. Now let’s calculate the average competitor price difference by product category:
# Calculate price difference
data['comp_price_diff'] = data['unit_price'] - data['comp_1']
# Compute average difference by product category
avg_price_diff_by_category = data.groupby('product_category_name')['comp_price_diff'].mean().reset_index()
# Plot using seaborn
plt.figure(figsize=(12, 6))
sns.barplot(data=avg_price_diff_by_category.sort_values('comp_price_diff', ascending=False),
x='product_category_name', y='comp_price_diff', palette='viridis')
# Title and labels
plt.title('Average Competitor Price Difference by Product Category', fontsize=14)
plt.xlabel('Product Category')
plt.ylabel('Average Competitor Price Difference')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
C:\Users\chaitna\AppData\Local\Temp\ipykernel_3632\2499637640.py:9: FutureWarning:
Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.
sns.barplot(data=avg_price_diff_by_category.sort_values('comp_price_diff', ascending=False),
Now let’s train a Machine Learning model for the task of Retail Price Optimization. Below is how we can train a Machine Learning model for this problem:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
X = data[['qty', 'unit_price', 'comp_1',
'product_score', 'comp_price_diff']]
y = data['total_price']
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=42)
# Train a linear regression model
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
DecisionTreeRegressor()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
DecisionTreeRegressor()
Now let’s make predictions and have a look at the predicted retail prices and the actual retail prices:
import plotly.graph_objects as go
import numpy as np
y_pred = model.predict(X_test)
# Ensure y_test and y_pred are NumPy arrays (optional)
y_test = np.array(y_test)
y_pred = np.array(y_pred)
# Create scatter plot
fig = go.Figure()
# Add predicted vs actual points
fig.add_trace(go.Scatter(
x=y_test, y=y_pred, mode='markers',
marker=dict(color='blue', opacity=0.6),
name='Predicted vs Actual'
))
# Add ideal prediction line (y = x)
fig.add_trace(go.Scatter(
x=[min(y_test), max(y_test)],
y=[min(y_test), max(y_test)],
mode='lines',
line=dict(color='red', dash='dash'),
name='Ideal Prediction'
))
# Customize layout
fig.update_layout(
title='Predicted vs. Actual Retail Price',
xaxis_title='Actual Retail Price',
yaxis_title='Predicted Retail Price',
height=600,
width=800
)
fig.show()
The ultimate aim of optimizing retail prices is to charge a price that helps you make the most money and attracts enough customers to buy your products. It involves using data and pricing strategies to find the right price that maximizes your sales and profits while keeping customers happy.